Abstract


This paper presents a new sensor placement measure called resolvability. This measure provides a
technique for estimating the relative ability of various sensor systems, including single camera
systems, stereo pairs, multi-baseline stereo systems, and 3D rangefinders, to accurately control
visually manipulated objects. The measure also indicates the capability of a visual sensor to
provide spatially accurate data on objects of interest. The term resolvability refers to the ability
of a visual sensor to resolve object positions and orientations. Our main interest in resolvability
is in determining the accuracy with which a manipulator being observed by a camera can visually
servo an object to a goal position and orientation. The resolvability ellipsoid is introduced to
illustrate the directional nature of resolvability, and can be used to direct camera motion and
adjust camera intrinsic parameters in real-time so that the servoing accuracy of the visual servoing
system improves with camera-lens motion. The Jacobian mapping from task space to sensor space is
derived for a single camera system, a stereo pair with parallel optical axes, and a stereo pair with
perpendicular optical axes. Resolvability ellipsoids based on these mappings for various sensor
configurations are presented. Visual servoing experiments demonstrate that resolvability can be
used to direct camera-lens motion in order to increase the ability of a visually servoed manipulator
to precisely servo objects.

1. Introduction


In order to effectively use visual feedback to perform robotic tasks, many researchers have
recognized that the placement of the sensor relative to the task is an important consideration [2],
[12], and [13]. Active camera-lens systems that possess several extrinsic as well as intrinsic
variable parameters provide a high degree of flexibility in providing information about the task by
allowing real-time control over sensor placement. This flexibility, however, makes effective
real-time camera-lens control challenging. A framework for stably controlling various extrinsic
camera parameters in real-time in order to optimize various sensor placement criteria has been
previously proposed and experimentally verified [8]. This paper presents a new sensor placement
measure called resolvability, which provides a technique for estimating the relative ability of
various visual sensor systems, including single camera systems, stereo pairs, multi-baseline stereo
systems, and 3D rangefinders, to accurately control visually manipulated objects and to provide
spatially accurate data on objects of interest.

The term resolvability refers to the ability of a visual sensor to resolve object positions and
orientations. For example, a typical single camera system has the ability to accurately resolve
object locations that lie in a plane parallel to the image plane, but can less accurately resolve
object depth based on the projection of object features on the image plane. Similarly, rotations
within planes parallel to the image plane can be more accurately resolved than rotations in planes
perpendicular to the image plane. The degree of resolvability is dependent on many factors. For
example, depth, focal length, number of features tracked and their image plane coordinates, position
and orientation of the camera, and relative positions and orientations of multiple cameras all
affect the magnitudes and directions of resolvability. Due to the difficulty in understanding the
multi-dimensional nature of resolvability, we propose the resolvability ellipsoid as a geometrical
representation of the ability of different visual sensor configurations to resolve object positions
and orientations.

Resolvability can be applied to both visual servoing and sensor planning. Our main interest is in
determining the accuracy with which a manipulator being observed by a camera can visually servo
an object to a goal position and orientation. The resolvability ellipsoid can be used to direct
camera motion and adjust camera intrinsic parameters in real-time so that the servoing accuracy of
the visual servoing system improves with camera-lens motion. A second use of resolvability is as an
aid in determining static camera placement for either object recognition or visual servoing.

Sensor resolution has been considered in the past as a criterion for sensor planning [1], [2], and
[11]. These efforts concern static camera systems in which a required spatial resolution is known
and a single camera placement is desired. In [3], a study of stereo, vergence, and focus cues for
determining range is described in which the performance of each cue for determining range accuracy
is characterized. This characterization can be used to control camera parameters in order to
improve the accuracy of range estimates. Our resolvability approach can be used for determining
the ability of a visually servoed manipulator to accurately resolve positions and orientations of
objects along all six degrees of freedom. Camera-lens intrinsic and extrinsic parameters can be
actively controlled using a resolvability measure in conjunction with other sensor placement
criteria so that the accuracy of visual control can be improved. The concept can also be used for
static sensor placement for either object recognition or visual servoing. This application
parallels the use of manipulability in determining the optimal placement of assembly tasks in a
manipulator's workspace [7].

There are many parallels between resolvability and the concept of manipulability as proposed in
[14]. Both concepts use an ellipsoidal representation of the singular value decomposition of a
Jacobian; for resolvability this is the Jacobian of the mapping between task space and sensor
space. An important difference is that a high manipulability means that relatively small joint
motions translate into relatively large endeffector motions. A high resolvability, on the other
hand, means that a small object displacement translates into a relatively large displacement in
sensor space. A high resolvability also indicates that a small object velocity projects a large
optical flow onto the image plane. In fact, our concept of resolvability more closely parallels
Ghosal and Roth's transmission ratio [4]. Resolvability can also be viewed as a measure of the
sensitivity of the sensor to displacements of the object of interest along particular directions in
the task space.

In this paper, we first derive a mapping from task space to sensor space, which is the first step in
determining the resolvability of a camera-lens system. We then briefly describe the singular value
decomposition and its ellipsoidal representation for resolvability. Next, different resolvability
ellipsoids for various camera-lens-task configurations are compared. The resolvability ellipsoids of
a stereo pair with parallel optical axes and a stereo pair with perpendicular optical axes are
presented as well. A technique for using the resolvability ellipsoid to direct camera motion is
proposed. A section on experimental results briefly describes our visual servoing hardware, our
visual tracking strategy, and results from visual servoing using different camera configurations.
These results demonstrate that resolvability predicts the ability of the system to accurately servo
under different camera-lens configurations.

2. Task Space to Sensor Space

Resolvability depends on the Jacobian of the mapping from task space to sensor space. We desire
a matrix form of the Jacobian which contains both intrinsic and extrinsic sensor parameters, in
order to analyze the effects of changing these parameters on the structure of the Jacobian. For any
visual sensor system, we desire an equation of the form

$$\dot{x}_S = J(\phi)\,\dot{X}_T \qquad (1)$$

where $\dot{x}_S$ is a velocity vector in sensor space, $J(\phi)$ is the Jacobian matrix and is a
function of the extrinsic and intrinsic parameters of the visual sensor as well as the number of
features tracked and their locations on the image plane, and $\dot{X}_T$ is a velocity vector in
task space.

2.1 Camera Model


The mapping from task space to sensor space for any system using a camera as the visual sensor
requires a camera-lens model in order to represent the projection of task objects onto the CCD
image plane. For visual servoing, a simple pinhole camera model has proven adequate for visual
tracking using our experimental setup [8]. If we place the camera coordinate frame {C} at the focal
point of the lens as shown in Figure 1, a feature on an object at ${}^{C}P$ with coordinates
$(X_C, Y_C, Z_C)$ in the camera frame projects onto the camera's image plane at

$$x_i = \frac{f X_C}{s_x Z_C} + x_p \qquad (2)$$

$$y_i = \frac{f Y_C}{s_y Z_C} + y_p \qquad (3)$$

where $(x_i, y_i)$ are the image coordinates of the feature, $f$ is the focal length of the lens,
$s_x$ and $s_y$ are the horizontal and vertical dimensions of the pixels on the CCD array, and
$(x_p, y_p)$ is the piercing point of the optical axis on the CCD. This model assumes that
$Z_C \gg f$.


The mapping from camera frame feature velocity to image plane optical flow, or sensor space
velocity, can be obtained simply by differentiating (2) and (3). This yields the following equations

$$\dot{x}_S = \frac{f\dot{X}_C}{s_x Z_C} - \frac{f X_C \dot{Z}_C}{s_x Z_C^2} = \frac{f\dot{X}_C}{s_x Z_C} - \frac{x_S \dot{Z}_C}{Z_C} \qquad (4)$$

$$\dot{y}_S = \frac{f\dot{Y}_C}{s_y Z_C} - \frac{f Y_C \dot{Z}_C}{s_y Z_C^2} = \frac{f\dot{Y}_C}{s_y Z_C} - \frac{y_S \dot{Z}_C}{Z_C} \qquad (5)$$

where $x_S = x_i - x_p$ and $y_S = y_i - y_p$. The mapping from the camera frame onto the image
plane is now defined. The next step is to transform task space velocities into the camera frame, and
then project these camera frame velocities onto the sensor space to obtain the mapping from task
space velocity to sensor space velocity.
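
As a quick numerical check of the projection and optical flow relations (2) through (5), the following is a minimal sketch rather than the authors' implementation. The function names, the feature position, and the feature velocity are assumptions chosen for illustration; the lens and pixel values are of the order quoted later in the paper.

```python
import numpy as np

def project(P_c, f, sx, sy, xp=0.0, yp=0.0):
    """Pinhole projection, eqs. (2)-(3): camera-frame point -> pixel coordinates."""
    X, Y, Z = P_c
    return np.array([f * X / (sx * Z) + xp, f * Y / (sy * Z) + yp])

def optical_flow(P_c, V_c, f, sx, sy):
    """Optical flow, eqs. (4)-(5), for a point moving with camera-frame velocity V_c."""
    X, Y, Z = P_c
    dX, dY, dZ = V_c
    xs, ys = project(P_c, f, sx, sy)  # coordinates relative to the piercing point
    return np.array([f * dX / (sx * Z) - xs * dZ / Z,
                     f * dY / (sy * Z) - ys * dZ / Z])

# Illustrative values (assumed): 12 mm lens, 11 um x 13 um pixels.
f, sx, sy = 12e-3, 11e-6, 13e-6
P = np.array([0.10, 0.05, 1.0])     # feature position in {C}, metres
V = np.array([0.02, -0.01, 0.03])   # feature velocity in {C}, m/s

# Compare the closed-form flow with a central difference of the projection.
h = 1e-6
flow_numeric = (project(P + h * V, f, sx, sy) - project(P - h * V, f, sx, sy)) / (2 * h)
flow_analytic = optical_flow(P, V, f, sx, sy)
print(flow_analytic, flow_numeric)
```

The two printed vectors should agree closely, which confirms that (4) and (5) are the time derivatives of (2) and (3).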


Figure 1: Pinhole camera model.

2.2 Objects Defined in a Task Frame

For visually servoing a manipulator holding an object, the objective is to move the image
coordinates of ${}^{C}P$ to some location on the image plane by controlling the motion of
${}^{C}P$. Typically, ${}^{C}P$ is some feature on an object being held by a manipulator. Thus, the
motion of ${}^{C}P$ is induced relative to the tool frame of the manipulator being observed.
Figure 2 shows the coordinate systems used to define the mapping from task space to sensor space
for ${}^{T}P$ with coordinates in the task frame of $(X_T, Y_T, Z_T)$. For now, we assume that the
rotation of the task frame {T} with respect to {C}, ${}^{C}_{T}R$, is known. The velocity of
${}^{C}P$ can be written as

$${}^{C}\dot{P} = {}^{C}_{T}R\left({}^{T}V + {}^{T}\dot{P} + {}^{T}\Omega \times {}^{T}P\right) \qquad (6)$$

where ${}^{T}V = [\dot{X}_T\ \dot{Y}_T\ \dot{Z}_T]^T$ and
${}^{T}\Omega = [\omega_{X_T}\ \omega_{Y_T}\ \omega_{Z_T}]^T$ are the translational and rotational
velocities of the task frame with respect to itself, respectively. These are manipulator endeffector
velocities that can be commanded. Since the object being servoed or observed is rigidly attached to
the task frame, ${}^{T}\dot{P} = 0$, and (6) becomes

$${}^{C}\dot{P} = {}^{C}_{T}R\left({}^{T}V + {}^{T}\Omega \times {}^{T}P\right) \qquad (7)$$

Furthermore, if we assume that {C} and {T} are aligned, as shown in Figure 2, then
${}^{C}_{T}R = I$ and the elements of ${}^{C}\dot{P}$ can be written as

$$\frac{dX_C}{dt} = \dot{X}_T + Z_T\,\omega_{Y_T} - Y_T\,\omega_{Z_T}$$

$$\frac{dY_C}{dt} = \dot{Y}_T - Z_T\,\omega_{X_T} + X_T\,\omega_{Z_T} \qquad (8)$$

$$\frac{dZ_C}{dt} = \dot{Z}_T + Y_T\,\omega_{X_T} - X_T\,\omega_{Y_T}$$

The assumption that {C} and {T} are aligned is only used in formulating the Jacobian from task
space to sensor space. If the transformation from task space to sensor space is initially known, and
the commanded task frame velocity is known, then the coordinates $(X_T, Y_T, Z_T)$ can be
appropriately updated while visual servoing. It will also be necessary to account for task frame
rotations when determining the velocity to command the task frame based on ${}^{T}V$ and
${}^{T}\Omega$. It would have been possible to include the terms of ${}^{C}_{T}R$ in (8); however,
the assumption made simplifies the derivation and does not affect the end result.

Figure 2: Task frame and camera frame definitions.
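
The rigid-body relation (7), and its component form (8) for aligned frames, can be sketched in a few lines. This is an illustrative example only: the function name `feature_velocity` and the numerical values are assumptions, and the rotation defaults to the identity as in the aligned-frame case.

```python
import numpy as np

def feature_velocity(P_t, V_t, W_t, R_ct=None):
    """Camera-frame velocity of a feature rigidly attached to the task frame, eq. (7).

    P_t  : feature coordinates (X_T, Y_T, Z_T) in the task frame
    V_t  : translational velocity of the task frame
    W_t  : rotational velocity of the task frame
    R_ct : rotation of {T} with respect to {C}; identity when the frames are aligned
    """
    R = np.eye(3) if R_ct is None else np.asarray(R_ct, dtype=float)
    return R @ (np.asarray(V_t, dtype=float) + np.cross(W_t, P_t))

# Assumed example values.
P_t = np.array([0.1, 0.2, 0.3])
V_t = np.array([1.0, 2.0, 3.0])
W_t = np.array([0.5, -0.4, 0.3])
v = feature_velocity(P_t, V_t, W_t)
print(v)
```

With the identity rotation, the three components of `v` are exactly the three lines of (8).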

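By combining (8) with (4) and (5), the entire Jacobian transformation for a single feature from task
space to sensor space can now be written in the form

$$
\begin{bmatrix}\dot{x}_S\\ \dot{y}_S\end{bmatrix} =
\begin{bmatrix}
\dfrac{f}{s_x Z_C} & 0 & -\dfrac{x_S}{Z_C} & -\dfrac{x_S Y_T}{Z_C} & \dfrac{f Z_T}{s_x Z_C}+\dfrac{x_S X_T}{Z_C} & -\dfrac{f Y_T}{s_x Z_C}\\[2mm]
0 & \dfrac{f}{s_y Z_C} & -\dfrac{y_S}{Z_C} & -\left(\dfrac{f Z_T}{s_y Z_C}+\dfrac{y_S Y_T}{Z_C}\right) & \dfrac{y_S X_T}{Z_C} & \dfrac{f X_T}{s_y Z_C}
\end{bmatrix}
\begin{bmatrix}\dot{X}_T\\ \dot{Y}_T\\ \dot{Z}_T\\ \omega_{X_T}\\ \omega_{Y_T}\\ \omega_{Z_T}\end{bmatrix}
\qquad (9)
$$

For the above form of the Jacobian, the parameters of the Jacobian are given by
$\phi = (f, s_x, s_y, x_S, y_S, Z_C, X_T, Y_T, Z_T)$. Alternatively, the sensor coordinates may be
omitted and replaced with camera frame coordinates to arrive at a Jacobian of the form

$$
\begin{bmatrix}\dot{x}_S\\ \dot{y}_S\end{bmatrix} =
\begin{bmatrix}
\dfrac{f}{s_x Z_C} & 0 & -\dfrac{f X_C}{s_x Z_C^2} & -\dfrac{f X_C Y_T}{s_x Z_C^2} & \dfrac{f Z_T}{s_x Z_C}+\dfrac{f X_C X_T}{s_x Z_C^2} & -\dfrac{f Y_T}{s_x Z_C}\\[2mm]
0 & \dfrac{f}{s_y Z_C} & -\dfrac{f Y_C}{s_y Z_C^2} & -\left(\dfrac{f Z_T}{s_y Z_C}+\dfrac{f Y_C Y_T}{s_y Z_C^2}\right) & \dfrac{f Y_C X_T}{s_y Z_C^2} & \dfrac{f X_T}{s_y Z_C}
\end{bmatrix}
\begin{bmatrix}\dot{X}_T\\ \dot{Y}_T\\ \dot{Z}_T\\ \omega_{X_T}\\ \omega_{Y_T}\\ \omega_{Z_T}\end{bmatrix}
\qquad (10)
$$

where the parameters are now $\phi = (f, s_x, s_y, X_C, Y_C, Z_C, X_T, Y_T, Z_T)$. Either form may
be desirable depending on the design parameters desired for determining sensor placement.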

Generally, several features on an object are tracked. For n feature points, the Jacobian is of the
form

$$J = \begin{bmatrix} J_1 \\ \vdots \\ J_n \end{bmatrix} \qquad (11)$$

where $J_i$ is the Jacobian matrix for each feature given by the 2x6 matrix in (9) or (10).
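
The single-feature Jacobian and its stacked form (11) can be assembled numerically as follows. This is a minimal sketch under the aligned-frame assumption of Section 2.2; the names `feature_jacobian`, `stacked_jacobian`, and `t_ct` (the task-frame origin expressed in the camera frame) are introduced here for illustration and do not come from the paper.

```python
import numpy as np

def feature_jacobian(P_t, t_ct, f, sx, sy):
    """2x6 task-space to sensor-space Jacobian for one feature, frames aligned.

    P_t  : feature coordinates (X_T, Y_T, Z_T) in the task frame
    t_ct : task-frame origin expressed in the camera frame (assumed known)
    """
    XT, YT, ZT = P_t
    XC, YC, ZC = np.asarray(t_ct, dtype=float) + np.asarray(P_t, dtype=float)
    ax, ay = f / (sx * ZC), f / (sy * ZC)   # f/(s_x Z_C) and f/(s_y Z_C)
    xs, ys = ax * XC, ay * YC               # image coordinates x_S, y_S
    return np.array([
        [ax, 0.0, -xs / ZC, -xs * YT / ZC, ax * ZT + xs * XT / ZC, -ax * YT],
        [0.0, ay, -ys / ZC, -(ay * ZT + ys * YT / ZC), ys * XT / ZC, ay * XT],
    ])

def stacked_jacobian(features_t, t_ct, f, sx, sy):
    """Stack the 2x6 blocks of n features into a 2n x 6 Jacobian, eq. (11)."""
    return np.vstack([feature_jacobian(P, t_ct, f, sx, sy) for P in features_t])

# Assumed example: two features, task frame 1 m in front of the camera.
f, sx, sy = 12e-3, 11e-6, 13e-6
t_ct = np.array([0.0, 0.0, 1.0])
features = [np.array([0.1, 0.1, 0.0]), np.array([-0.1, 0.1, 0.0])]
J = stacked_jacobian(features, t_ct, f, sx, sy)
print(J.shape)
```

Multiplying `J` by a six-element task-space velocity gives the predicted optical flow of every tracked feature in one step.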

2.3  Objects  Defined  in  the  Camera  Frame 

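For eye-in-hand tracking, it may be preferable to define the task frame to be equivalent to the
camera frame. This simplifies the derivation of the task frame to sensor frame Jacobian. The point
of interest is observed by translating and rotating the camera. Therefore, (6) becomes

$${}^{C}\dot{P} = -{}^{C}V - {}^{C}\Omega \times {}^{C}P \qquad (12)$$

where ${}^{C}V = [\dot{x}_C\ \dot{y}_C\ \dot{z}_C]^T$ and
${}^{C}\Omega = [\omega_{x_C}\ \omega_{y_C}\ \omega_{z_C}]^T$ are the translational and rotational
velocities of the camera with respect to the current camera frame. By combining (4) and (5) with
(12) as in the previous derivation, one can arrive at the following form for the Jacobian

$$
\begin{bmatrix}\dot{x}_S\\ \dot{y}_S\end{bmatrix} =
\begin{bmatrix}
-\dfrac{f}{s_x Z_C} & 0 & \dfrac{x_S}{Z_C} & \dfrac{s_y x_S y_S}{f} & -\left(\dfrac{f}{s_x}+\dfrac{s_x x_S^2}{f}\right) & \dfrac{s_y y_S}{s_x}\\[2mm]
0 & -\dfrac{f}{s_y Z_C} & \dfrac{y_S}{Z_C} & \dfrac{f}{s_y}+\dfrac{s_y y_S^2}{f} & -\dfrac{s_x x_S y_S}{f} & -\dfrac{s_x x_S}{s_y}
\end{bmatrix}
\begin{bmatrix}\dot{x}_C\\ \dot{y}_C\\ \dot{z}_C\\ \omega_{x_C}\\ \omega_{y_C}\\ \omega_{z_C}\end{bmatrix}
\qquad (13)
$$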


3.  Ellipsoidal  Representations  for  Resolvability 

The ability of a visual sensor to resolve task positions and orientations is, of course,
directionally dependent. By performing a singular value decomposition on the task space to sensor
space Jacobian, and analyzing the singular values and the eigenvectors of $J^TJ$ which result from
the decomposition, the directional properties of the ability of the sensor to resolve positions and
orientations become apparent.

Singular value decomposition (SVD) is a common technique used for analyzing the range and null
space of a matrix transformation. Details concerning the SVD can be found in [5]. The SVD of a
matrix $A$ is given by

$$A = U\Sigma V^T \qquad (14)$$

where $\Sigma$ is a diagonal matrix containing the square roots of the eigenvalues of $A^TA$ and
$AA^T$, also called the singular values of $A$, $U$ contains the eigenvectors of $AA^T$, and $V$
contains the eigenvectors of $A^TA$. For resolvability the eigenvectors of $J^TJ$ are those in which
we are interested, because these eigenvectors give us a set of basis vectors for the row space of
$J$, which is also the vector space described by $\mathcal{R}(J^T)$, the range of $J^T$. These basis
vectors combined with their corresponding singular values indicate the directionality of the ability
of the camera-lens system to resolve positions and orientations in task space. Conversely, the
eigenvectors of $JJ^T$ tell us the effect of task space object motion in sensor space.

In order to use an ellipsoidal representation of resolvability, we first assume that the object of
interest has an equal ability to translate and rotate about all of its Cartesian axes. Furthermore,
we assume that the velocity of the object is constrained to fall within some six-dimensional
spheroid, such that

$$\|\dot{X}_T\| = \left(\dot{X}_T^2 + \dot{Y}_T^2 + \dot{Z}_T^2 + \omega_{X_T}^2 + \omega_{Y_T}^2 + \omega_{Z_T}^2\right)^{1/2} \qquad (15)$$

and $\|\dot{X}_T\| \le 1$. Under these assumptions the principal axes of the ellipsoid representing
the ability of $J$ to resolve positions and orientations in task space are given by $\sigma_1 v_1$,
$\sigma_2 v_2$, $\sigma_3 v_3$, $\sigma_4 v_4$, $\sigma_5 v_5$, and $\sigma_6 v_6$, where
$\sigma_i$ is the ith singular value of $J$ and $v_i$ is the ith eigenvector of $J^TJ$. In the next
section we present ellipsoids for various camera-lens-object configurations.
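
The principal axes described above follow directly from a numerical SVD. The sketch below uses a small made-up Jacobian (an assumption, not one of the paper's configurations) and a hypothetical helper `resolvability_axes`; it also checks that the squared singular values are the eigenvalues of $J^TJ$.

```python
import numpy as np

def resolvability_axes(J):
    """Return the singular values sigma_i and the principal axes sigma_i * v_i of J."""
    _, sigma, Vt = np.linalg.svd(J, full_matrices=False)
    return sigma, sigma[:, None] * Vt   # row i is sigma_i * v_i

# Toy 2x3 Jacobian with assumed values: two sensor coordinates, three task directions.
J = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 0.5]])
sigma, axes = resolvability_axes(J)

# Eigenvalues of J^T J, largest first; the third is zero (an unresolvable direction).
eigvals = np.sort(np.linalg.eigvalsh(J.T @ J))[::-1]
print(sigma, eigvals)
print(axes)
```

Because this Jacobian has only two rows, one task direction lies in its null space, so the ellipsoid collapses to an ellipse, as happens for the single-feature case discussed in the next section.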


4.  Resolvability  Ellipsoids  for  a  Single  Camera  System 


The resolvability ellipsoid makes the directional nature of a visual sensor's ability to resolve
positions and orientations apparent. In this section, the resolvability ellipsoid for a simple
single camera system is illustrated. The extrinsic parameters that can be varied include the six
degrees of freedom of the camera position and orientation. The intrinsic parameter that can vary is
the focal length. Changing the position or orientation of the camera also has the effect of changing
the coordinates of the features being tracked in sensor space. The effect of including more features
in the tracking system is also shown.


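Displaying a six-dimensional ellipsoid is somewhat problematic, so the mapping described by (10)
has been decomposed into two mappings, one representing translational components and one
representing rotational components. The mappings are described by

$$
\begin{bmatrix}\dot{x}_S\\ \dot{y}_S\end{bmatrix} =
\begin{bmatrix}
\dfrac{f}{s_x Z_C} & 0 & -\dfrac{f X_C}{s_x Z_C^2}\\[2mm]
0 & \dfrac{f}{s_y Z_C} & -\dfrac{f Y_C}{s_y Z_C^2}
\end{bmatrix}
\begin{bmatrix}\dot{X}_T\\ \dot{Y}_T\\ \dot{Z}_T\end{bmatrix}
\qquad (16)
$$

$$
\begin{bmatrix}\dot{x}_S\\ \dot{y}_S\end{bmatrix} =
\begin{bmatrix}
-\dfrac{f X_C Y_T}{s_x Z_C^2} & \dfrac{f Z_T}{s_x Z_C}+\dfrac{f X_C X_T}{s_x Z_C^2} & -\dfrac{f Y_T}{s_x Z_C}\\[2mm]
-\left(\dfrac{f Z_T}{s_y Z_C}+\dfrac{f Y_C Y_T}{s_y Z_C^2}\right) & \dfrac{f Y_C X_T}{s_y Z_C^2} & \dfrac{f X_T}{s_y Z_C}
\end{bmatrix}
\begin{bmatrix}\omega_{X_T}\\ \omega_{Y_T}\\ \omega_{Z_T}\end{bmatrix}
\qquad (17)
$$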


Figure 3 shows the components of the resolvability ellipsoid when a feature on an object is tracked.
The translational components from (16) are shown by the ellipsoid on the left, and the rotational
components from (17) are shown to the right. The object is 1 m from the camera frame, $f$ is 12 mm,
and the feature is located in the task frame at (0.1m,0,0). The pixel dimensions used are
$s_x$ = 11 µm and $s_y$ = 13 µm. The two-dimensional ellipse that lies in a plane approximately
parallel to the image plane indicates that depth cannot be resolved, since $\mathcal{R}(J^T)$ has
no basis vector which can describe motion along that direction. Since the feature lies on the
$x$ axis of the image plane, rotations about $X_T$ cannot be resolved; however, rotational motion
about $Y_T$ and $Z_T$ can be observed. In Figure 4 a second feature is added at (-0.1m,0,0). The
ability to resolve positions in $X_T$ and $Y_T$ increases with another independent data point, and
differences in depth can now be resolved as the three-dimensional shape of the ellipsoid indicates,
although depth resolution is much poorer than $X_T$ and $Y_T$ resolution. Rotations about $X_T$
still cannot be resolved. In Figure 5, however, the feature coordinates are moved to (0.1m,0.1m,0)
and (-0.1m,0.1m,0), and the rotational ellipsoid becomes three-dimensional, indicating that
rotations about all axes can be resolved. It is important to note that translations and rotations
are to be resolved independently using (16) and (17). If an SVD of (10) is performed, only four
singular values would be non-zero, indicating that translational and/or rotational motion about two
axes cannot be resolved independent of the other axes.

Figure 4: Resolvability Ellipsoids: single camera system, f=12mm, depth=1.0m,
2 features located in the task frame at (0.1m,0,0) and (-0.1m,0,0).
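
The dimensionality statements for the three feature layouts can be reproduced by counting the non-zero singular values of the translational block, the rotational block, and the full Jacobian. This is a minimal sketch that assumes aligned frames and a task-frame origin on the optical axis at a depth of 1 m; the helper names are hypothetical.

```python
import numpy as np

F, SX, SY = 12e-3, 11e-6, 13e-6   # f, s_x, s_y as quoted in the text
DEPTH = 1.0                        # assumed: task-frame origin on the optical axis, Z_C = 1 m

def jacobian(features_t, depth=DEPTH, f=F, sx=SX, sy=SY):
    """Stacked 2n x 6 Jacobian for features given in the task frame (frames aligned)."""
    rows = []
    for XT, YT, ZT in features_t:
        XC, YC, ZC = XT, YT, ZT + depth
        ax, ay = f / (sx * ZC), f / (sy * ZC)
        xs, ys = ax * XC, ay * YC
        rows.append([ax, 0.0, -xs / ZC, -xs * YT / ZC, ax * ZT + xs * XT / ZC, -ax * YT])
        rows.append([0.0, ay, -ys / ZC, -(ay * ZT + ys * YT / ZC), ys * XT / ZC, ay * XT])
    return np.array(rows, dtype=float)

def resolvable_dims(features_t):
    """Number of resolvable directions: (translational, rotational, full six-DOF)."""
    J = jacobian(features_t)
    return (int(np.linalg.matrix_rank(J[:, :3])),
            int(np.linalg.matrix_rank(J[:, 3:])),
            int(np.linalg.matrix_rank(J)))

one_feature = [(0.1, 0.0, 0.0)]
two_on_axis = [(0.1, 0.0, 0.0), (-0.1, 0.0, 0.0)]
two_offset = [(0.1, 0.1, 0.0), (-0.1, 0.1, 0.0)]

for name, feats in [("one feature", one_feature),
                    ("two on axis", two_on_axis),
                    ("two offset", two_offset)]:
    print(name, resolvable_dims(feats))
```

The counts show the translational ellipsoid growing from two to three dimensions when the second feature is added, the rotational ellipsoid becoming three-dimensional only once the features leave the image $x$ axis, and the full two-feature Jacobian having four non-zero singular values.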

In Figure 6, the depth to the object is 0.5m and $f$ is 12mm. In Figure 7, the depth is 1.0m, and
$f$ is 24mm. It is interesting to note that the image plane coordinates for both ellipsoids are the
same, but the smaller depth results in a resolvability that is approximately twice the resolvability
of the example in which the focal length is doubled. This indicates that for a given magnification
of the object ($f/Z_C$), reducing depth is preferable to increasing focal length when trying to
resolve depth using a single camera.
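
The factor of roughly two can be checked numerically with the translational mapping (16). The sketch below is illustrative only: it assumes aligned frames, a task-frame origin on the optical axis, and it takes the smallest singular value of the translational Jacobian as the depth resolvability, which is an approximation because the weakest direction is close to, but not exactly, the optical axis for these features.

```python
import numpy as np

def translational_jacobian(features_t, depth, f, sx=11e-6, sy=13e-6):
    """Stacked translational mapping (16) for features given in the task frame."""
    rows = []
    for XT, YT, ZT in features_t:
        XC, YC, ZC = XT, YT, ZT + depth   # aligned frames, origin on the optical axis
        rows.append([f / (sx * ZC), 0.0, -f * XC / (sx * ZC ** 2)])
        rows.append([0.0, f / (sy * ZC), -f * YC / (sy * ZC ** 2)])
    return np.array(rows)

def weakest_sigma(J):
    """Smallest singular value, used here as a proxy for resolvability in depth."""
    return np.linalg.svd(J, compute_uv=False)[-1]

features = [(0.1, 0.1, 0.0), (-0.1, 0.1, 0.0)]   # feature layout of Figures 6 and 7
near = weakest_sigma(translational_jacobian(features, depth=0.5, f=12e-3))
zoom = weakest_sigma(translational_jacobian(features, depth=1.0, f=24e-3))
ratio = near / zoom
print(near, zoom, ratio)
```

Both configurations give the same magnification $f/Z_C$, but the depth column of (16) scales with $f/Z_C^2$, which is why halving the depth roughly doubles the weakest singular value compared with doubling the focal length.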

Figure 8 is a plot of resolvability in depth versus focal length and depth of the object in the
camera frame. The plot also shows the boundary of the image plane, beyond which the feature
projections do not fall on the CCD of the camera. Pixel coordinates may vary along $x_S$ from -255
to 256, and along $y_S$ from -239 to 240. For the graph, an object with two features is observed.
The task frame coordinates of the features are (0.05m,0,0) and (-0.05m,0,0). The graph shows the
relationship between depth, focal length, and resolvability in depth. From the plot one can observe
that progressively smaller depths have progressively larger effects on resolvability in depth, while
focal length tends to affect depth resolvability more linearly. In practice, depth becomes limited
by the depth-of-field of the lens, and a trade-off must be made between focal length, depth,
depth-of-field, and field-of-view.

Figure 5: Resolvability Ellipsoids: single camera system, f=12mm, depth=1.0m, 2
features located in the task frame at (0.1m,0.1m,0) and (-0.1m,0.1m,0).

Figure 6: Resolvability Ellipsoids: single camera system, f=12mm, depth=0.5m, 2
features located in the task frame at (0.1m,0.1m,0) and (-0.1m,0.1m,0).

Figure 7: Resolvability Ellipsoids: single camera system, f=24mm, depth=1.0m, 2
features located in the task frame at (0.1m,0.1m,0) and (-0.1m,0.1m,0).

Figure 8: Resolvability of depth versus depth of object and focal length for two
features located in the task frame at (0.05m,0,0) and (-0.05m,0,0).


The position of object features on the image plane affects resolvability as well. In Figure 9, a
planar object with four features is moved across the image plane and the ability of the camera-lens
system to resolve different orientations of the object is plotted. The plot illustrates that the
ability to resolve orientations about the optical axis increases with the distance of the object
from the optical axis, as long as the feature projections remain on the image plane.

Figure 9: Resolvability in orientation about Z versus center of object projection onto the image plane.

5.  Resolvability  Ellipsoids  for  Multiple  Camera  Systems 


5.1 Stereo Pair - Parallel Optical Axes

In this section, the Jacobian for a stereo pair with parallel optical axes observing an object
described relative to a task frame is derived. The derivation is based on equations for a stereo
eye-in-hand system given in [6]. The term $b$ represents the length of the baseline of the cameras,
which is the line segment between camera focal points. The origin of the camera frame lies on the
baseline midway between focal points, with the -Z axis pointing towards the object task frame, as
shown in Figure 10. A feature on an object at ${}^{C}P$ with coordinates $(X_C, Y_C, Z_C)$ in the
camera frame projects onto each image plane at

$$x_{Sl} = \frac{f\left(X_C + \frac{b}{2}\right)}{s_x Z_C} \qquad x_{Sr} = \frac{f\left(X_C - \frac{b}{2}\right)}{s_x Z_C} \qquad (18)$$

$$y_{Sl} = \frac{f Y_C}{s_y Z_C} \qquad y_{Sr} = \frac{f Y_C}{s_y Z_C} \qquad (19)$$

where it is assumed that $f$, $s_x$, and $s_y$ are the same for both cameras.

Figure 10: Coordinate frames for a stereo pair with parallel axes.


Through a similar derivation as the one presented in Section 2.2 (and with the same assumption
that the camera and task frames are aligned), the relation between task space velocity and sensor
space velocity can be written as

(20)

Equations (18)-(19) can be used to obtain this Jacobian in terms of the sensor coordinates and task
coordinates alone. Camera coordinates, including the depth $Z_C$, are not needed. We assume task
coordinates are known for visual servoing, since we assume the object we are servoing on is of a
known size and shape. The form for the Jacobian is

where $d = x_{Sl} - x_{Sr}$ is the disparity of each corresponding feature point. This form does not
require an explicit estimate of the depth $Z_C$.

The resolvability ellipsoid for this system is very similar to the previous ellipsoids for the
monocular system in Section 4. The resolvability in depth for a stereo system with a baseline $b$ is
similar to the depth resolvability for a monocular system tracking two features separated in the
task frame $X_T$-$Y_T$ plane by a length $b$. A comparison of Figure 11 with Figure 4 illustrates
this. Figure 12 shows the ellipsoids when tracking two features in each image. Figure 13 is a plot
of resolvability in depth for the stereo system versus $b$ and depth when tracking a single feature
in both images. This plot shows that the effect of baseline length $b$ on depth resolvability is
very similar to the effect of focal length on depth resolvability as shown in Figure 8. As one would
expect, the plot shows that a high depth resolvability is more easily achieved with small depths,
rather than large baselines, subject to the boundary of the image plane.
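
The dependence of stereo depth resolvability on baseline and depth can be sketched by differentiating (18) and (19) with respect to the camera-frame coordinates. The construction below is an assumption made for illustration, not the paper's Jacobian: it covers translations only, places a single feature on the midline between the cameras, and uses hypothetical helper names. For that symmetric case the depth column is orthogonal to the other two, so the weakest singular value has the closed form $f b / (\sqrt{2}\, s_x Z_C^2)$.

```python
import numpy as np

F, SX, SY = 12e-3, 11e-6, 13e-6   # assumed lens and pixel parameters

def camera_rows(X_eff, YC, ZC):
    """Translational rows for one camera that sees the feature at lateral offset X_eff."""
    return [[F / (SX * ZC), 0.0, -F * X_eff / (SX * ZC ** 2)],
            [0.0, F / (SY * ZC), -F * YC / (SY * ZC ** 2)]]

def stereo_translational(P_c, b):
    """4x3 translational Jacobian obtained by differentiating (18)-(19)."""
    XC, YC, ZC = P_c
    left = camera_rows(XC + b / 2, YC, ZC)    # left image, eq. (18)
    right = camera_rows(XC - b / 2, YC, ZC)   # right image, eq. (18)
    return np.array(left + right)

def depth_sigma(b, depth):
    """Weakest singular value for a single feature on the midline at the given depth."""
    J = stereo_translational((0.0, 0.0, depth), b)
    return np.linalg.svd(J, compute_uv=False)[-1]

base = depth_sigma(b=0.2, depth=1.0)
wider = depth_sigma(b=0.4, depth=1.0)    # doubled baseline
closer = depth_sigma(b=0.2, depth=0.5)   # halved depth
print(base, wider / base, closer / base)
```

Doubling the baseline doubles the depth resolvability in this sketch, while halving the depth quadruples it, which is consistent with the observation that small depths are more effective than large baselines.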

5.2  Stereo  Pair  -  Orthogonal  Optical  Axes 

An orthogonal stereo pair is shown in Figure 14. If the axes are aligned as shown in the figure, the
Jacobian mapping from task space to sensor space can be written as

b=20cm, depth=1.0m, 1 feature located in the task frame at (0,0.1m,0).

Figure 12: Resolvability Ellipsoids: stereo pair-parallel optical axes, f=12mm, b=20cm, depth=1.0m,
2 features located in the task frame at (0,0.2m,0) and (0,-0.2m,0).

Figure 13: Resolvability in depth versus baseline length and depth of object for a stereo pair, parallel
optical axes, f=12mm, and a single feature located at the origin of the task frame.

Figure 14: Coordinate frames for a stereo pair with orthogonal optical axes.

By rewriting this Jacobian in a form which uses sensor coordinates, the Jacobian becomes

A feature tracked in each image gives a very well conditioned translational ellipsoid, as shown in
Figure 15. When tracking two features in each camera, the rotational ellipsoid shown in Figure 16
is representative of the resolvability of the camera-lens arrangement.

depth=1.0m, 1 feature located in the task frame at (0.1m,0.1m,0).

Figure 16: Resolvability Ellipsoids: stereo pair-perpendicular optical axes, f=12mm, depth=1.0m, 2
features located in the task frame at (-0.1m,0.1m,0) and (0.1m,-0.1m,-0.1m).

6. Directing Camera-Lens Motion Using the Resolvability Ellipsoid

Our primary interest in resolvability is in using it to actively guide camera-lens motion for a
single camera while performing visually servoed manipulation tasks. Static sensor placement
algorithms can also use the directional qualities of the resolvability ellipsoid to direct the
search for an optimal placement of the visual sensor. In order to guide camera-lens motion, it must
be determined which camera-lens parameters can be freely changed. A gradient in parameter space can
be determined based on these free parameters which directs camera-lens motion towards
configurations with improved resolvability. For monocular and stereo tracking, motion along the
optical axis often has the poorest resolvability. In order to determine motions which will increase
resolvability along the optical axis, the gradient of $\sigma_o$ with respect to the camera-lens
parameter space can be calculated, where $\sigma_o$ represents the singular value which corresponds
to the eigenvector along the optical axis. The gradient is a function of the intrinsic and extrinsic
parameters of the camera that can vary, as well as the location of the features on the image plane.
The gradient can be written as

$$\nabla\sigma_o = \begin{bmatrix} \dfrac{\partial\sigma_o}{\partial x_C} & \dfrac{\partial\sigma_o}{\partial y_C} & \dfrac{\partial\sigma_o}{\partial z_C} & \dfrac{\partial\sigma_o}{\partial r_{x_C}} & \dfrac{\partial\sigma_o}{\partial r_{y_C}} & \dfrac{\partial\sigma_o}{\partial r_{z_C}} & \dfrac{\partial\sigma_o}{\partial f} \end{bmatrix}$$

The individual components of $\nabla\sigma_o$ can be calculated numerically. Camera-lens motion can
then be directed along $\nabla\sigma_o$ in order to increase $\sigma_o$, subject to other sensor
placement criteria such as depth-of-field and field-of-view constraints which simultaneously affect
camera-lens motion. A technique for integrating this gradient into the visual tracking control law
can be found in [8].
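
A numerical gradient of this kind can be sketched with central differences. The example below is deliberately reduced and is not the procedure of [8]: it varies only two of the parameters (camera depth and focal length), takes the weakest translational singular value as $\sigma_o$, and assumes aligned frames with two features placed symmetrically on the image $x$ axis. The function names and step size are assumptions.

```python
import numpy as np

SX, SY = 11e-6, 13e-6                              # assumed pixel dimensions
FEATURES = [(0.05, 0.0, 0.0), (-0.05, 0.0, 0.0)]   # task-frame features, as for Figure 8

def sigma_depth(params):
    """Weakest translational singular value for params = (depth z_C, focal length f)."""
    depth, f = params
    rows = []
    for XT, YT, ZT in FEATURES:
        XC, YC, ZC = XT, YT, ZT + depth
        rows.append([f / (SX * ZC), 0.0, -f * XC / (SX * ZC ** 2)])
        rows.append([0.0, f / (SY * ZC), -f * YC / (SY * ZC ** 2)])
    return np.linalg.svd(np.array(rows), compute_uv=False)[-1]

def numerical_gradient(fun, params, rel_step=1e-6):
    """Central-difference gradient of a scalar function of the camera-lens parameters."""
    params = np.asarray(params, dtype=float)
    grad = np.zeros_like(params)
    for i, p in enumerate(params):
        h = rel_step * max(abs(p), 1e-12)
        up, down = params.copy(), params.copy()
        up[i] += h
        down[i] -= h
        grad[i] = (fun(up) - fun(down)) / (2 * h)
    return grad

params = np.array([1.0, 12e-3])          # depth 1 m, focal length 12 mm
grad = numerical_gradient(sigma_depth, params)
print(grad)
```

The sign pattern of `grad` indicates that moving the camera closer or increasing the focal length both increase depth resolvability; in a real system each step along the gradient would still have to respect the depth-of-field and field-of-view constraints mentioned above.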

7.  Results 

7.1  Experimental  Setup 

To experimentally demonstrate the implications of resolvability, a manipulator was visually servoed
by an active camera-lens system at various depths and focal lengths. Comparisons of the performance
of the controller at different camera-lens configurations are presented. The experimental
setup is shown in Figure 17. Visual servoing algorithms have been implemented on a robotic assembly
system consisting of three Puma 560's called the Troikabot. One of the Pumas has a Sony
XC-77RR camera with a zoom lens mounted at its endeffector. The camera is connected to a
Datacube Maxtower Vision System. The Pumas are controlled from a VME bus with two Ironics
IV-3230 (68030 CPU) processors, an IV-3220 (68020 CPU) which also communicates with a
trackball, a Mercury floating point processor, and a Xycom parallel I/O board communicating with
three Lord force sensors mounted on the Pumas' wrists. All processors on the controller VME run
the Chimera3 real-time operating system [10]. A diagram of the hardware setup is given in
Figure 18. The vision system VME communicates with the controller VME using BIT3 VME-to-VME
adapters. The Datacube Maxtower Vision System calculates the optical flow of the features
using a Sum-of-Squared-Differences (SSD) algorithm. A special high performance floating-point
processor on the Datacube is used to calculate the optical flow of the features, and a 68030 board,
also on the vision system, computes the control inputs. An image can be grabbed and displacements
for up to ten 16x16 feature templates in the scene can be determined at 30Hz. The control input for
tracking is sent to the Mercury floating point processor at 30Hz, where a Cartesian controller
calculates the proper joint control commands.

Figure 17: Experimental Setup.
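
The SSD feature tracking described above can be illustrated with a small exhaustive search. This is a minimal sketch assuming a brute-force search over a small window; the function name `ssd_displacement`, the search radius, and the synthetic test image are hypothetical and do not reflect how the Datacube hardware implements the computation.

```python
import numpy as np

def ssd_displacement(prev_img, next_img, top_left, size=16, search=4):
    """Find the (dy, dx) shift of a size x size template that minimizes the SSD.

    The template is cut from prev_img at top_left = (row, col) and compared with
    every window of next_img displaced by at most `search` pixels in each direction.
    """
    r, c = top_left
    template = prev_img[r:r + size, c:c + size].astype(float)
    best_ssd, best_shift = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rr, cc = r + dy, c + dx
            # Skip candidate windows that fall outside the image.
            if rr < 0 or cc < 0 or rr + size > next_img.shape[0] or cc + size > next_img.shape[1]:
                continue
            window = next_img[rr:rr + size, cc:cc + size].astype(float)
            ssd = np.sum((window - template) ** 2)
            if ssd < best_ssd:
                best_ssd, best_shift = ssd, (dy, dx)
    return best_shift

# Synthetic check: shift a random image by 2 rows and -3 columns.
rng = np.random.default_rng(0)
frame0 = rng.random((64, 64))
frame1 = np.roll(frame0, shift=(2, -3), axis=(0, 1))
shift = ssd_displacement(frame0, frame1, top_left=(20, 20))
print(shift)
```

The measured displacement of each template, divided by the sampling period, is what a tracking controller of this kind would treat as the optical flow of that feature.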

7.2 Visual Tracking Controller

The state equation for the visual servoing system is created by discretizing (9) and rewriting the
discretized equation as

$$x(k+1) = A x(k) + T J(k) u(k) \qquad (25)$$

where M is the number of features being tracked, $A = I_{2M}$, $x(k) \in \mathbb{R}^{2M}$, T is the sampling period of
the vision system, and $u(k) = [\dot{x}_T \;\; \dot{y}_T \;\; \dot{z}_T \;\; \omega_{x_T} \;\; \omega_{y_T} \;\; \omega_{z_T}]^T$ is the manipulator endeffector velocity.

Figure 18: The Troikabot system architecture.

A control strategy can be derived using the controlled active vision paradigm [9]. The control objective
of the visual tracking system is to control endeffector motion in order to place the image
plane coordinates of features on the target at some desired position. The desired image plane coordinates
could be constant or changing with time. The control strategy used to achieve the control
objective is based on the minimization of an objective function at each time instant. The objective
function places a cost on differences in feature positions from desired positions, as well as a cost
on providing control input, and is of the form

$$F(k+1) = [x(k+1) - x_D(k+1)]^T Q [x(k+1) - x_D(k+1)] + u^T(k) R u(k) \qquad (26)$$

This expression is minimized with respect to the current control input $u(k)$. The end result yields
the following expression for the control input

$$u(k) = -\left( T J^T(k) Q T J(k) + R \right)^{-1} T J^T(k) Q [x(k) - x_D(k+1)] \qquad (27)$$

The weighting matrices Q and R allow the user to place more or less emphasis on the feature error
and the control input. Their selection affects the response and stability of the tracking system. The
Q matrix must be positive semi-definite, and R must be positive definite for a bounded response.
Although no standard procedure exists for choosing the elements of Q and R, general guidelines
can be found in [9].
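
The step from the objective function (26) to the control law (27) can be sketched as follows, writing $x_D$ for the desired feature vector and assuming, as is usual for quadratic costs of this kind, that Q and R are symmetric. This is an outline of the standard one-step minimization, not a reproduction of the derivation in [9].

```latex
% Substitute the state equation (25), with A = I_{2M}, into (26):
F(k+1) = [x(k) + T J(k) u(k) - x_D(k+1)]^T Q \, [x(k) + T J(k) u(k) - x_D(k+1)] + u^T(k) R \, u(k)

% Set the gradient with respect to u(k) to zero:
\frac{\partial F(k+1)}{\partial u(k)}
  = 2\, T J^T(k) Q \, [x(k) + T J(k) u(k) - x_D(k+1)] + 2 R \, u(k) = 0

% Collect the terms in u(k):
\left( T J^T(k) Q \, T J(k) + R \right) u(k) = - T J^T(k) Q \, [x(k) - x_D(k+1)]

% Solve for u(k), which gives (27):
u(k) = -\left( T J^T(k) Q \, T J(k) + R \right)^{-1} T J^T(k) Q \, [x(k) - x_D(k+1)]
```

With Q positive semi-definite and R positive definite, the matrix $T J^T(k) Q T J(k) + R$ is positive definite, so the inverse in the last line always exists.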

For the experiments performed, only the translational resolvability components were considered;
therefore, the Jacobian J given by (16) is used in the control law (27) and $u(k) = [\dot{x}_T \;\; \dot{y}_T \;\; \dot{z}_T]^T$.
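
The control law (27) together with the state equation (25) can be exercised in a few lines. The following is a minimal sketch under stated assumptions: the Jacobian is a generic pinhole translational model standing in for (16) and is held constant over the run, the pixel dimensions and the weights Q and R are hypothetical, and the feature state is expressed as an offset from its initial value.

```python
import numpy as np

def control_input(J, x, x_des_next, Q, R, T):
    """Control law (27): u = -(T J^T Q T J + R)^{-1} T J^T Q (x - x_des_next)."""
    TJ = T * J
    lhs = TJ.T @ Q @ TJ + R
    rhs = TJ.T @ Q @ (x - x_des_next)
    return -np.linalg.solve(lhs, rhs)

def pinhole_jacobian(features_cam, f, sx=1.1e-5, sy=1.3e-5):
    """Hypothetical 2M x 3 translational Jacobian (assumed pinhole model, not (16) itself)."""
    rows = []
    for X, Y, Z in features_cam:
        rows.append([f / (sx * Z), 0.0, -f * X / (sx * Z**2)])
        rows.append([0.0, f / (sy * Z), -f * Y / (sy * Z**2)])
    return np.array(rows)

T = 1.0 / 30.0                                   # 30 Hz vision sampling period
features_cam = np.array([[-0.02, 0.0, 1.0],      # two features 4 cm apart
                         [0.02, 0.0, 1.0]])      # at 1 m depth
J = pinhole_jacobian(features_cam, f=0.015)
Q = np.eye(4)                                    # assumed feature error weights
R = np.eye(3)                                    # assumed control effort weights

x = np.zeros(4)                                  # feature state relative to start, pixels
x_des = J @ np.array([0.0, 0.0, 0.05])           # goal reachable by motion along Z

for _ in range(30):
    u = control_input(J, x, x_des, Q, R, T)
    x = x + T * (J @ u)                          # state equation (25) with A = I

print(np.linalg.norm(x - x_des))                 # remaining feature error in pixels
```

In this example the goal can only be reached by motion along the optical axis, where the Jacobian has its smallest singular value, so for a fixed R the error along that direction contracts more slowly per step than it would for motion parallel to the image plane.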

7.3  Experimental  Results 

In  order  to  demonstrate  the  implications  of  resolvability  for  visual  servoing,  an  object  was  placed 
in  the  gripper  of  the  manipulator  such  that  two  features  on  the  object  located  approximately  4cm 
apart  fell  in  a  plane  parallel  to  the  image  plane  of  the  camera.  The  desired  coordinates  of  the  image 
plane  projection  of  the  features  were  commanded  such  that  the  distance  between  the  features 
should  decrease  by  10  pixels,  but  the  center  of  mass  of  the  features  should  remain  constant. 
Figure 19 illustrates how the object might initially appear on the image plane and how the object
should appear after the desired locations of the feature coordinates have been achieved. In order
for the tracking controller to command the manipulator to move the image plane coordinates of the
features to their desired locations, the manipulator must move parallel to the optical axis of the
camera. This is the poorest direction of translational resolvability for a single camera system. The
change in the depth of the object with respect to the camera frame was recorded as a function of
time and results are plotted in Figures 20 and 21. The greater the depth resolvability of a given
camera-lens configuration, the smaller the required translation of the object to effect a 10 pixel change
in object length on the image plane.

Figure 19: Images showing initial and desired locations of feature templates for the experiments
performed. Panels: initial feature locations; final desired feature locations.
Figure 20: Change in object depth required to effect a 10 pixel change in projected object length for three
different initial depths and a focal length of 15mm.

In Figure 20 the change in depth of the object is plotted versus time for three trials in which the
focal length of the camera is held constant at 15mm and the initial depth of the object with respect
to the camera varies from 50cm to 1m. The increase in resolvability with decreasing depth is made
apparent by the smaller changes in depth required to move the object so that the length between
features on the image plane decreases by 10 pixels. At 1m, the object must translate 19.6cm in order
to effect a 10 pixel change on the image plane. At 75cm, only 11.5cm of translational motion
is required, while at 50cm the object moves 7.2cm. Also shown on the figure is the estimated
resolvability along Z based on the singular value along Z of (16). Since resolvability varies continuously
across camera intrinsic and extrinsic parameters, the trials will not give a completely accurate
picture of the instantaneous resolvability for each of the three initial camera configurations based
on change in object depth. It can be seen, however, that there is general agreement in the relative
change in the calculated resolvability and the relative change in the difference in translation along
Z.

Figure 21: Change in object depth required to effect a 10 pixel change in projected object length for three
different focal lengths at an initial depth of 1.0m.

Figure 21 shows translational motion along Z when the object servos from the same initial depth but
with a different focal length. All trials were run from an initial object depth of 1m. For a focal
length of 15mm, the depth change was 19.6cm, for 30mm the depth change was 15.7cm, and for
75mm the depth change was 7.7cm. It should be noted that the zoom lens was not calibrated. The
focal lengths are estimates obtained from the scale inscribed by the manufacturer on the lens body,
and are undoubtedly highly inaccurate. Nevertheless, the trends one would expect in resolvability at
various focal lengths versus required changes in depth to effect a 10 pixel change in feature distance
are apparent.
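
The qualitative trends in Figures 20 and 21 can be reproduced with an idealized pinhole calculation. The sketch below is an illustration only: it assumes an ideal pinhole camera, an object length of 4cm lying parallel to the image plane, and a hypothetical pixel size of 11 micrometres, none of which were calibrated in the experiments. It is therefore expected to reproduce the direction of the trends rather than the measured values.

```python
def required_depth_change(f, z0, length=0.04, pixel=1.1e-5, delta_px=10.0):
    """Depth increase (m) needed to shorten the projected object length by delta_px pixels.

    Pinhole model: projected length in pixels is l = f * length / (pixel * z).
    Solving l(z0 + dz) = l(z0) - delta_px for dz gives dz = z0 * delta_px / (l0 - delta_px).
    f, z0, length and pixel are in metres; pixel is an assumed sensor pixel size.
    """
    l0 = f * length / (pixel * z0)
    if l0 <= delta_px:
        raise ValueError("projected length is too short for the requested pixel change")
    return z0 * delta_px / (l0 - delta_px)

# Fixed focal length, varying initial depth (compare with the trend in Figure 20).
for z0 in (0.5, 0.75, 1.0):
    print(f"f=15mm, depth={z0:.2f}m: dz={100 * required_depth_change(0.015, z0):.1f}cm")

# Fixed initial depth, varying focal length (compare with the trend in Figure 21).
for f in (0.015, 0.030, 0.075):
    print(f"f={1000 * f:.0f}mm, depth=1.00m: dz={100 * required_depth_change(f, 1.0):.1f}cm")
```

In this model the required translation grows with initial depth and shrinks with focal length, matching the direction of the measured changes. Differences in magnitude are to be expected given the uncalibrated lens and the assumed pixel size.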

8.  Conclusion 

The directional nature of the resolvability ellipsoid makes this sensor placement criterion particularly
useful for guiding visual sensor motion in real-time, or as an aid in determining the placement
of static sensors. In this paper we have shown that resolvability can effectively represent the ability
of several different sensor configurations to resolve translational and rotational positions of objects
being observed. For visual servoing, this concept directly relates to the accuracy with which a manipulator
can move an object to some desired goal position and orientation. Experimental results
using an uncalibrated zoom lens demonstrate that resolvability can be used to direct camera-lens
motion in particular directions in order to increase the ability of a visually servoed manipulator to
perform precise manipulation tasks.